Abstract

An important property of an intelligent robot is the ability to determine the location of an object in
3-D space. This paper proposes a general object localization system structure, discusses some
important issues in localization, and gives an overview of currently available object localization
algorithms and systems. The algorithms reviewed in the paper are characterized by their feature
extraction and matching strategies, their range finding methods, the types of locatable objects, and
their mathematical formulations.

1 Introduction 

Since the early 1980’s, the use of robots in industry has become increasingly popular or even crucial in
some areas. Through practice, researchers have realized that an important property of an intelligent
robot is the ability to determine the location of stationary and moving objects. For example, space
station construction, repair, maintenance, and satellite refueling have been identified as
potential application areas for telerobotic systems. An efficient means of locating
objects is one of the keys to developing such systems successfully.

Object localization has long been defined as part of the object recognition process in computer vision
research [4]. In most instances, however, the emphasis of that research is on object recognition, and
object localization is only a by-product. In robotic applications, by contrast, object localization is
usually the ultimate goal. It also has many problems of its own, such as real-time considerations,
accuracy issues, types of locatable objects, and working conditions, which object recognition research
generally does not address. In some systems, “locate” has been defined as one of the basic
independent operations the telerobot system is to perform [27]. As a result, object localization
research has attracted increasing attention recently.

This paper will give an overview of the three-dimensional object localization problem. First, it 
provides a closer examination of the problem and then proposes a general object localization system 
structure. Some important issues of object localization and the possible implementations of key 
components of the proposed object localization system are discussed and compared. A summary is 
presented in the final section. 


2 The Object Localization Problem 

As we have mentioned, object localization is the determination of the location of an object. What 
must be solved when a robot vision system is trying to locate an object? A necessary component 
of every intelligent robot system is the world modeling system which stores, among other things, a 
representation of all the object models that are relevant to the robot's operation and a definition of
the sensing coordinate system. To specify an object in the world modeling system, there must exist
either an implicit or explicit coordinate system which is associated with that object. 

The real problem to be solved when locating an object, therefore, is to determine the relative
location between the sensing coordinate system and the coordinate system of the object. This
is somewhat different from what object localization means from the point of view of object
recognition, where localization can mean a description of the spatial relationships
among the objects to be recognized.

The relative location of two coordinate systems can be specified in any one of the following
ways:

1. Position and orientation: 

The position can be specified by three parameters, e.g., the (x,y,z) coordinates of the origin of 
the object coordinate system relative to the sensing coordinate system. There are three different 
representations of orientation: 

• Three Euler angles $\alpha, \beta, \gamma$, or angles about the coordinate axes.

• A unit vector $\mathbf{r}$ and an angle $\theta$.

• A quaternion. [22, 26]

2. A 4 × 4 homogeneous transformation matrix.

3. Dual number quaternion: [37] 

This is an extension of the quaternion representation in which each quantity is changed to a dual
quantity [11]. The dual quaternion has an interpretation similar to that of the real quaternion:

$$\hat{q} = \begin{bmatrix} \sin(\hat{\theta}/2)\,\hat{\mathbf{n}} \\ \cos(\hat{\theta}/2) \end{bmatrix}$$

where the vector $\hat{\mathbf{n}}$ is a unit line vector about which the coordinate system has rotated and
translated and $\hat{\theta}$ is the dual angle of rotation and translation.

The objective of the object localization algorithm is to compute the parameters which specify the 
corresponding representation. 

Because a rigid object has six degrees of freedom in three-dimensional space
(three for position, three for orientation), at least six independent parameters must
be determined. The advantage of the position/orientation representation is that it has the minimum or
near-minimum number of variables. But there are disadvantages. The angular representations use
trigonometric functions, which are of infinite order and lead to a nonpolynomial criterion. Vectorial
representations also have singularities: when the rotation angle $\theta$ is zero, the axis of rotation is
arbitrary. The matrix representation is linear and has no singularity problem, but its inherent
redundancy for rotation leads to constraints in a high-dimensional space and makes the computation
somewhat harder. The dual number quaternion representation has a dimension of eight, which is slightly
higher than the minimum but is still quite simple to compute.
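
As an illustration of how the representations listed above relate to one another, the following minimal sketch converts a unit vector and angle into a quaternion, the quaternion into a rotation matrix, and the rotation matrix plus a position into a 4 × 4 homogeneous transformation. The function names and the (w, x, y, z) component ordering are assumptions made for this example; they are not taken from the systems reviewed here.

```python
import math

def axis_angle_to_quaternion(axis, theta):
    """Unit quaternion (w, x, y, z) for a rotation of theta radians about a unit axis."""
    half = theta / 2.0
    s = math.sin(half)
    return (math.cos(half), s * axis[0], s * axis[1], s * axis[2])

def quaternion_to_matrix(q):
    """3x3 rotation matrix (list of rows) for a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def homogeneous(rotation, position):
    """4x4 homogeneous transformation from a rotation matrix and a position."""
    rows = [rotation[i] + [position[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows

def apply(transform, point):
    """Map a 3-D point through a 4x4 homogeneous transformation."""
    p = list(point) + [1.0]
    return tuple(sum(transform[i][j] * p[j] for j in range(4)) for i in range(3))

# Hypothetical pose: object frame rotated 90 degrees about z, origin at (1, 2, 3).
q = axis_angle_to_quaternion((0.0, 0.0, 1.0), math.pi / 2)
T = homogeneous(quaternion_to_matrix(q), (1.0, 2.0, 3.0))
corner = apply(T, (1.0, 0.0, 0.0))  # object point (1, 0, 0) in the sensing frame
```

The singularity mentioned above is visible here: with a rotation angle of zero, every choice of axis yields the same quaternion, so the axis cannot be recovered from the result.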

In some applications, the localization problem can be simplified due to extra constraints imposed 
on the object. For example, if an object is so constrained that a planar surface of the object is always 
lying on a plane, the degrees of freedom of the object are reduced to three: one for the rotation and 
two for the translation. 

How does one solve for these position/orientation parameters if sensor data and object models are
given? Usually the computation is carried out by a matching process. That is, the object localization
algorithm tries to find a “best” transformation which brings the sensed features into alignment with
their corresponding model features.
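
For the constrained planar case described above (one rotation and two translations), a "best" transformation can be written in closed form once point features have been paired. The sketch below is a standard least-squares alignment of paired 2-D points, offered only as an illustration; the function name `locate_planar` and the sample coordinates are hypothetical, and the pairing itself is assumed to be given.

```python
import math

def locate_planar(model_pts, sensed_pts):
    """Least-squares planar pose (theta, tx, ty) such that
    sensed ~= R(theta) * model + t, for already paired 2-D points."""
    n = len(model_pts)
    mcx = sum(p[0] for p in model_pts) / n
    mcy = sum(p[1] for p in model_pts) / n
    scx = sum(p[0] for p in sensed_pts) / n
    scy = sum(p[1] for p in sensed_pts) / n
    dot = cross = 0.0
    for (mx, my), (sx, sy) in zip(model_pts, sensed_pts):
        ax, ay = mx - mcx, my - mcy      # model point relative to its centroid
        bx, by = sx - scx, sy - scy      # sensed point relative to its centroid
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)       # rotation that best aligns the two sets
    c, s = math.cos(theta), math.sin(theta)
    tx = scx - (c * mcx - s * mcy)       # translation maps centroid to centroid
    ty = scy - (s * mcx + c * mcy)
    return theta, tx, ty

# Hypothetical example: a triangle moved by a known pose, then recovered.
true_theta, true_t = 0.3, (5.0, -2.0)
model = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
c0, s0 = math.cos(true_theta), math.sin(true_theta)
sensed = [(c0 * x - s0 * y + true_t[0], s0 * x + c0 * y + true_t[1]) for x, y in model]
planar_pose = locate_planar(model, sensed)
```

With noisy measurements the same formulas return the pose minimizing the sum of squared point-to-point distances.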

From the above description, it is not difficult to see that a general object localization system
should contain the following components: (1) sensing system: to provide the necessary measurements;
(2) world model: to represent all the objects, including the robot and sensors, and their relationships; (3)
feature extraction: to retrieve the features which are to be used in the matching process; (4) matching:
to pair the sensed features with the corresponding model features; and (5) computing: to calculate
the transformation parameters. See Fig. 1 for the configuration.

Figure 1: Object localization system organization

Based on the types of sensors used in the sensing system, the sensing method can be divided
into serial sensing and parallel sensing. If the necessary sensor data is obtained through a series of
measurements, as is the case when a spot range sensor is used, it is called serial sensing; if the
sensor data can be obtained by a single measurement, it is called parallel sensing. In the case of
serial sensing, feature extraction might go through a repeated sensing-extraction process.
The geometric features to be extracted and matched can be classified as low-level features and
high-level features. Possible low-level features include points, vectors, line segments, axes, surface
patches, edges, and boundaries. Possible high-level features include straight dihedrals, circular
dihedrals [9], principal directions of surface curves, minimum, maximum and mean curvatures of
surfaces, and Gaussian curvatures. Usually, the lower the level of the features, the greater the
number of features to be extracted.

The matching process is the process of finding the pairings of sensed features and the model 
features. Depending on the level of intelligence of the system, the matching could be done in different 
ways. On the lowest level, there is no matching process at all in the system. Whenever a measurement
is taken, either a default matching is assumed or a man-assisted matching is provided. On
telerobotics systems, for example, the teleoperator might interactively assist the model matching by 
indicating with a light pen which features in the image (e.g. edges, corners) correspond to those in a 
stored model [1]. On higher levels, the system will be able to pair the features automatically. Table
1 shows some known feature matchings which have been used in literature to derive the location of 
an object. Sometimes, a combination of feature matchings is necessary to completely specify a rigid
transformation.

Measured features      Matched to
-----------------      ------------------
Point                  Point
                       Planar surface
                       Surface patch
Surface normal         Surface normal
Line segment           Line segment
                       Planar surface
                       Surface patch
Edge                   Edge
Planar surface         Planar surface
Quadric surface        Quadric surface
Gaussian curvature     Gaussian curvature

Table 1: Known Matching Strategies in Object Localization

We have just shown and discussed a general object localization mechanism. There is currently no
common solution for its implementation; each component could be implemented in many different ways.
Some components or relations in this mechanism may be unnecessary in certain implementations. The
next two sections discuss further the sensing system and the feature extraction & matching
strategies.


3 Some Issues 

We have just shown a general structure for object localization systems. In practice, there are some
important issues which must be considered when a real localization system is designed.

1. Real-time execution : A hierarchical control structure has been defined as a standard for
telerobot control system architecture [1] and has been adopted by researchers developing individual
telerobot systems, such as those at Goddard [27] and the University of Michigan [36].
The functions of the vision system differ at each level, and so do the requirements for the
object localization algorithms. Usually, the higher the level, the slower the completion rate. See
Table 2 for typical completion rates at each level of telerobot control.

At the object task planning level, for example, one of the functions of the vision system is to
recognize the environment. The object localization system, as a part of the vision system, is
used to give approximate measurements of the locations of the objects in the environment.
The execution time is in the range of minutes. At the E-move level, however, the rate of completion
is in the range of seconds. If a visual-feedback control strategy is used here, the localization
system has to generate updated measurements within the same time frame so that the control system
can adjust the robot’s movement, and real-time execution becomes important. Because the timing
requirements differ, the localization strategies might also differ.

                Average rate of      Average                 Planning
                change in output     replanning interval     horizon
-----------     ----------------     -------------------     ---------
Servo           1 kHz                1 millisec.             15 msec.
Primitive       62 Hz                16 millisec.            300 msec.
E-Move          8 Hz                 128 millisec.           2 sec.
Object/task     1 Hz                 1 second                30 sec.
Service Bay     0.1 Hz               10 seconds              > 10 min.
Mission         0.01 Hz              1.7 minutes             > 1 hour

Table 2: The rate of subtask completion at each level of hierarchy. ([1])

2. Accuracy. Accuracy is another important issue in object localization. There are two definitions
of accuracy, namely absolute accuracy $\epsilon$ and relative accuracy $\Delta\epsilon$.

• Absolute accuracy is defined as the difference between a measured value $m$ and its real
value $s$. That is, $\epsilon = m - s$.

• Relative accuracy is defined as the difference between a measured difference and its real
difference. For example, if the real difference between two points is $\Delta p$ and the two
measurements on the two points are $s_1, s_2$, then the measured difference is a function
of $s_1$ and $s_2$, e.g., $\Delta\hat{p} = f(s_1, s_2)$, and the relative accuracy of the measurements is
$\Delta\epsilon = \Delta\hat{p} - \Delta p$.

High accuracy, especially high absolute accuracy, is not always required. For example, accuracy
is not crucial at the beginning of an assembly task, but it is a determining factor in the final
stage of the operation. Even then, the determining factor is relative accuracy rather than
absolute accuracy.

Absolute accuracy depends to a large extent on the accuracy of the sensing system. But this
does not mean that a good localization algorithm is unnecessary. A good algorithm should
be insensitive to measurement noise, object distortions and other factors which could influence
the accuracy of the localization.

The achievement of high relative accuracy, on the other hand, does not necessarily depend on
highly accurate sensing systems. Human eyes, for example, are not good at locating objects in
the absolute sense, but humans have no difficulty picking up an object. Research has also
supported this point of view [25]. Therefore, when designing an algorithm, one must evaluate its
performance according to both its absolute accuracy and its relative accuracy, a distinction which
has been neglected by some researchers.

3. The type of locatable objects : It is best if the system can locate arbitrarily shaped objects. If
this is difficult, an alternative is to find specific detectable features for each object
and store these features in that object's model for feature extraction and matching in the localization
process. If such features do not exist for some object, then one should try to make special marks
on the object. Therefore, some guidelines should be given at the component design stage so that
the design is favorable to part grasping and localization by the telerobot system. Sometimes,
very simple modifications to the part design can greatly improve the part localization process.

4. Sensing system : What types of sensing techniques should be used in a localization system?
Where should one install the sensing system? How is the dynamic range of the sensing system
determined? These are just some of the issues in designing a sensing system.
Jarvis [20] has presented an early overview of range finding techniques. Each technique has its
advantages and disadvantages. Image-based sensing provides complete information about the
environment but takes time to process. Sparse data can be used directly for fast localization
but requires a sensing system capable of fast data acquisition. Multi-spot sensing is an
example of such a system [21]. As for installation, perhaps some sensors should be installed
in fixed locations while others are mounted on the robot’s moving parts. Newly developed
technologies should be used in localization systems. For example, motorized-zoom and auto-focus
techniques can improve the dynamic range of measurements, and VLSI techniques can reduce
the size of the whole sensing system. The use of such advanced techniques will have a great impact
on the design of object localization systems.
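
The distinction between the two accuracy definitions in item 2 can be made concrete with a small numerical sketch. It assumes, purely for illustration, a sensor whose only error is a constant offset; the function names and all numbers below are hypothetical.

```python
def absolute_error(measured, real):
    """Absolute accuracy: measured value minus real value, per coordinate."""
    return tuple(m - s for m, s in zip(measured, real))

def difference(a, b):
    """Vector from point a to point b."""
    return tuple(q - p for p, q in zip(a, b))

def relative_error(measured_a, measured_b, real_a, real_b):
    """Relative accuracy: measured difference minus real difference."""
    return absolute_error(difference(measured_a, measured_b),
                          difference(real_a, real_b))

# Hypothetical scene: a gripper and a part seen by a sensor with a constant offset.
bias = (0.25, -0.5, 0.125)
gripper_real, part_real = (10.0, 20.0, 5.0), (10.5, 20.0, 5.0)
gripper_meas = tuple(p + b for p, b in zip(gripper_real, bias))
part_meas = tuple(p + b for p, b in zip(part_real, bias))

abs_err = absolute_error(part_meas, part_real)   # equals the sensor offset
rel_err = relative_error(gripper_meas, part_meas, gripper_real, part_real)  # offset cancels
```

Under this assumption the absolute error of each measurement equals the offset, while the error in the gripper-to-part difference vanishes, which is the situation the assembly example above relies on.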


4 Feature Extraction & Matching Strategies 

As we said, the object localization process is basically a feature matching process: finding the
best estimate of the transformation parameters which align some modeled object features with certain
(perhaps different types of) measured object features. Based on how feature matching is realized,
object localization algorithms can be broadly divided into two categories: algorithms which do
not involve any recognition process, and those which involve some degree of recognition.
We call these two types direct-localization algorithms and recognition-localization
algorithms respectively.

Obviously the second type of algorithm has a higher level of intelligence than the first.
Even within the second type, the levels of intelligence can differ. Some algorithms can
establish the matchings within one object, some can do so within a group of objects of the same
type, and others can match the features within a group of objects of different types. At the highest
level, the algorithm could locate unmodeled objects. To do this, a set of primitive features should
be specified in a database, forming the basic building blocks from which any object is constructed.
Before localizing the unknown object, the algorithm must explore the object and establish a model
for it using the set of primitives.

Each type of algorithm can be further classified according to its sensing methods, the types of
features used for matching, its mathematical formulation, the types of locatable objects, and so
on.


4.1 Direct-Localization 

Direct-localization algorithms are mostly used in situations where the working environment
is highly structured, or the positional relationships among the objects in the environment have
previously been established approximately, or a human operator can indicate where to
take the required measurements. The telerobotics applications in most space programs meet these
requirements.

Because no recognition is involved, the localization process is quite simple. The extracted features 
and model features can be used as inputs for direct computation. The time of localization depends 
on the time spent on measurements and feature extraction. 

One method, proposed by Gunnarsson and Prinz [18, 19], is based on the observation that if a
set of points is measured and these measurements are distributed over the object surfaces, the best
transformation is the one which minimizes the sum of the distances between each measured point and
its corresponding transformed surface. This idea leads to a point-surface matching strategy.
Formulated in mathematical terms, their algorithm becomes a least squares minimization
problem and can be used to locate arbitrarily shaped objects. An iterative numerical procedure
is usually needed to solve the problem; the one they used is a modified Lagrange
multiplier and Newton-Raphson method. Because the object’s approximate location is assumed known,
a good initial guess can be provided, and the algorithm converges in most cases.
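
The point-surface idea can be sketched in a reduced setting. The example below is a planar analogue (points matched to straight edges, three pose parameters) solved with a plain Gauss-Newton iteration. It is not the modified Lagrange multiplier and Newton-Raphson procedure of [18, 19]; Gauss-Newton is swapped in here only because it is short to write down. All names and sample values are hypothetical, and each measured point is assumed to be already assigned to its model edge.

```python
import math

def rot(theta, v):
    """Rotate the 2-D vector v by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= f * m[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x

def locate_point_to_line(points, lines, pose0, iters=20):
    """Minimize the sum of squared point-to-line distances over (theta, tx, ty).
    points[i] is a sensed point assumed to lie on model line lines[i] = (n, d),
    i.e. the set {x : n . x = d} with unit normal n in model coordinates.
    The returned pose maps model coordinates into the sensing frame."""
    theta, tx, ty = pose0
    for _ in range(iters):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for (px, py), (n, d) in zip(points, lines):
            rn = rot(theta, n)                  # model normal seen in the sensing frame
            drn = rot(theta, (-n[1], n[0]))     # derivative of rn with respect to theta
            ex, ey = px - tx, py - ty
            r = rn[0] * ex + rn[1] * ey - d     # signed point-to-line distance
            j = (drn[0] * ex + drn[1] * ey, -rn[0], -rn[1])
            for a in range(3):
                jtr[a] += j[a] * r
                for b in range(3):
                    jtj[a][b] += j[a] * j[b]
        step = solve3(jtj, [-v for v in jtr])   # Gauss-Newton update
        theta, tx, ty = theta + step[0], tx + step[1], ty + step[2]
    return theta, tx, ty

# Hypothetical data: five points on three edges of a unit square, moved by true_pose.
true_pose = (0.2, 1.0, -0.5)
lines = [((1.0, 0.0), 0.0), ((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0),
         ((0.0, 1.0), 0.0), ((1.0, 0.0), 1.0)]
model_pts = [(0.0, 0.2), (0.0, 0.8), (0.3, 0.0), (0.7, 0.0), (1.0, 0.5)]
points = []
for x in model_pts:
    rx, ry = rot(true_pose[0], x)
    points.append((rx + true_pose[1], ry + true_pose[2]))
estimate = locate_point_to_line(points, lines, pose0=(0.1, 0.9, -0.4))
```

The starting pose plays the role of the approximately known object location; an iteration of this kind is only expected to converge when that guess is reasonably close.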

Gordon and Seering [17] developed a system which uses striped-light and camera sensing to gather 
necessary range data. The system can only locate planar objects. Line-surface matching is used in 
their algorithm. The striped-light when projected on the planar surfaces of the object generates 
straight-line segments. The scene is then viewed by a camera. The equation of each line segment can 
be obtained by analyzing the corresponding image of that line segment viewed by the camera. Three 
independent line segments are needed to compute the rotation and translation parameters. The fact 
that the line vector is perpendicular to the rotated modeled surface normal vector can be used to 
derive the rotation. The algorithm uses quaternions to represent rotation and uses a numerical method 
to compute it. They also give a closed-form solution for the rotation when the three sensed surfaces are
mutually perpendicular. The calculated rotation is then used to compute the translation.
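
The perpendicularity constraint described above can be written out as follows. This is a sketch in notation introduced here for illustration, not the formulation of [17]: $\mathbf{n}_i$ is the unit normal of the $i$-th modeled surface with plane equation $\mathbf{n}_i \cdot \mathbf{x} = d_i$ in model coordinates, $\mathbf{l}_i$ is the direction of the line segment measured on that surface, $\mathbf{p}_i$ is any measured point on that segment, and $(R, \mathbf{t})$ maps model coordinates into the sensing frame.

```latex
% Rotation: each measured line lies in its rotated model plane,
% so it is perpendicular to the rotated normal.
\mathbf{l}_i \cdot (R\,\mathbf{n}_i) = 0, \qquad i = 1, 2, 3

% Translation: with R known, each measured point must satisfy the
% transformed plane equation, which is linear in t.
(R\,\mathbf{n}_i) \cdot (\mathbf{p}_i - \mathbf{t}) = d_i, \qquad i = 1, 2, 3
```

The first set gives three equations for the three rotational degrees of freedom. Once $R$ is fixed, the second set is a 3 × 3 linear system in $\mathbf{t}$, which is solvable when the three normals are linearly independent.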

The same striped-light and camera sensing system is also used by Rutkowski, Benton, et al.
[3, 28], but their matching strategy is point-surface matching. In their algorithm, the measured
points are taken from extracted line segments, either straight or curved. Their method imposes no
particular constraints on the shapes of the object surfaces, as long as the surfaces can be
partitioned into a collection of primitive surfaces, such as planes, cylinders, or spheres. The
computation is carried out by repeated location adjustment. Each adjustment is expressed by three
quantities: the rotation center, the rotation axis and the translation vector. To ensure fast
convergence, the center of mass of the data points is chosen as the rotation center instead of the
origin of the model’s coordinate system.

In the above methods, if the object is a polyhedron, at least three surfaces need to be accessed in
order to take enough measurements. Shao, Volz, et al. [29] have implemented an algorithm based
on line-segment to line-segment matching, which needs to access only one surface when localizing a
planar object. Their algorithm can locate objects which have planar surfaces, quadric surfaces and
surfaces of revolution. A line range sensor is used to extract line-segment parameters. The
line segments are either boundary edges or axes. The extraction of only two line segments is enough
to locate an object. Closed-form formulas are used to compute the position and orientation parameters.

Comparing these methods, we find that all of them have very high measurement
accuracy and fast execution speed. For example, Gordon’s system has an execution time of 2.5 seconds
and a relative accuracy of 0.002 inches in translation and 0.1 degrees in rotation when a two-inch
cube is being located, and is capable of reliably assembling components with little clearance without
using force-controlled motion. In Gunnarsson’s algorithm, the measurement error is of the same order
of magnitude as the sensor error. These algorithms also have some problems. The problem associated
with striped-light sensing is that it requires an extra light source with a special pattern, which
is sometimes inconvenient. The use of a spot sensor or line sensor requires multiple measurements;
that is, the sensor has to be installed on the robot’s moving part and moved together with the robot
in order to take them. This slows down the localization process.

High-level features can also be used to locate objects. For example, Thorne et al. [35] described
an algorithm which uses features such as the radius of curvature or the curvature along a space curve
to locate an object. The curvature $\kappa$ or radius of curvature $\rho$ of a space curve can be
expressed as a function of the arc length $s$ of the curve, e.g., $\kappa = \kappa(s)$ (or $\rho = \rho(s)$),
which is independent of the coordinates of the curve and is thus invariant under rotation and
translation. The algorithm assumes that there exists a particular feature line, or fingerprint, for
each object. The feature line could be a certain portion of the curved edge(s) of the object or a
curve on the object surface. A curvature plot along the feature line can be drawn. In the database,
the feature line is specified by a set of discrete points, with each point associated with its
coordinates (x,y,z), radius of curvature, curvature, delta length, and total length. The total length
is zero for the first point. The localization proceeds through point-point matching. The method first
measures a set of discrete points along the feature line and then finds a corresponding point for
each measured point; a least squares optimization algorithm is then used to find the location
parameters. A similar algorithm, which uses iso-Gaussian matching (an iso-Gaussian being a curve
connecting points of constant Gaussian curvature) to localize an object, was described
by Gunnarsson [18].
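
The invariance of curvature as a function of arc length can be checked numerically. The sketch below estimates curvature at each sample from the circle through three consecutive points (the Menger curvature) and approximates arc length by summed chord lengths. These discretization choices, the function names, and the test curve are assumptions for illustration and are not taken from [35].

```python
import math

def menger_curvature(a, b, c):
    """Curvature of the circle through three 3-D points (0 if they coincide)."""
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    cross = (ab[1] * ac[2] - ab[2] * ac[1],
             ab[2] * ac[0] - ab[0] * ac[2],
             ab[0] * ac[1] - ab[1] * ac[0])
    twice_area = math.sqrt(sum(v * v for v in cross))
    sides = math.dist(a, b) * math.dist(b, c) * math.dist(a, c)
    return 2.0 * twice_area / sides if sides > 0.0 else 0.0

def curvature_profile(points):
    """List of (arc length, curvature) pairs for the interior points of a sampled curve."""
    profile, s = [], 0.0
    for i in range(1, len(points) - 1):
        s += math.dist(points[i - 1], points[i])
        profile.append((s, menger_curvature(points[i - 1], points[i], points[i + 1])))
    return profile

def rigid_move(p, angle, shift):
    """Rotate p about the x axis by angle (radians), then translate by shift."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (x + shift[0], c * y - s * z + shift[1], s * y + c * z + shift[2])

# Hypothetical feature line: an arc of radius 2, so the curvature should be 1/2.
curve = [(2.0 * math.cos(0.1 * i), 2.0 * math.sin(0.1 * i), 0.0) for i in range(20)]
moved = [rigid_move(p, 0.7, (5.0, -1.0, 3.0)) for p in curve]
model_profile = curvature_profile(curve)
sensed_profile = curvature_profile(moved)
```

Because both profiles agree, a point measured on the moved curve can be paired with a model point by looking up its curvature and arc length, after which the pose follows from point-point least squares.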

4.2 Recognition-Localization 

In many applications, the objects could be placed anywhere in the environment. Therefore, if a 
measurement is made and some sensed features are extracted, the localization system has no prior 
knowledge about which object or which part of the object the sensed features should belong to. In 
this case, in order to compute the location of an object, a recognition process is needed, which will 
establish the matchings between a set of sensed features and the model features. 

There are two popular matching strategies: tree searching and clustering. 

In the tree searching strategy, if there are $k$ sensed features $S_i$, $i = 1, \ldots, k$, and $l_m$ model features
$M_j$, $j = 1, \ldots, l_m$, for object $O_m$, $m = 1, \ldots, n$, a search tree can be constructed for each known object
$O_m$ such that the tree has $l_m$ levels and each intermediate node has $k$ branches. Each path from root
to leaf represents a potential matching. The total number of possible matchings, or the search
space, for object $O_m$ is $k^{l_m}$, which is very large. To reduce the search space, several methods have
been proposed.

One algorithm, proposed by Grimson, Lozano-Perez, et al. [15, 16], uses local geometrical
constraints such as the distance constraint, angle constraint, direction constraint, and triple-product
constraint to reduce the search space. Starting from the root of the tree and working down, a
local constraint test is made at each node to see whether the sensed features up to that level are
consistent with these constraints. If they are not, the entire subtree is discarded from consideration.
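
The pruning effect of a local constraint can be illustrated with a toy search that uses only the distance constraint on point features. This is a simplified sketch, not the algorithm of [15, 16]: the tree is laid out as described above (one level per model feature, one branch per sensed feature), and the function name, tolerance, and coordinates are hypothetical.

```python
import math

def search(model, sensed, tol=1e-6):
    """Depth-first tree search with distance-constraint pruning.
    Level i assigns a sensed point to model point i. Returns the first
    consistent assignment (a list of sensed indices) and the node count."""
    visited = 0

    def consistent(assign, cand):
        # Distances to all previously matched points must agree with the model.
        i = len(assign)
        return all(
            abs(math.dist(model[i], model[j])
                - math.dist(sensed[cand], sensed[assign[j]])) <= tol
            for j in range(i))

    def extend(assign):
        nonlocal visited
        if len(assign) == len(model):
            return list(assign)
        for cand in range(len(sensed)):
            visited += 1
            if consistent(assign, cand):   # otherwise the whole subtree is skipped
                found = extend(assign + [cand])
                if found is not None:
                    return found
        return None

    return extend([]), visited

# Hypothetical data: four model points, and the same points rotated by 90 degrees,
# shifted by (10, 5), and listed in a different order.
model = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 1.0)]
sensed = [(7.0, 9.0), (10.0, 5.0), (9.0, 5.0), (10.0, 9.0)]
matching, nodes_tested = search(model, sensed)
full_space = len(sensed) ** len(model)   # k ** l_m paths without pruning
```

In this example only a small fraction of the full search space is examined before a consistent matching is found, because an inconsistent pairing near the root removes every path beneath it.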

A similar tree searching method is used in Faugeras and Hebert’s work [12, 13, 14]. Instead of local
constraints, rigidity is used as the basic constraint during the tree search. Every path from the root
to an intermediate node (level $k$, for instance) represents a partial matching. The algorithm computes
a best rigid transformation $T^*$ up to that level. Then $T^*$ is applied to the next unmatched model
primitive $M_{k+1}$, and only those sensed primitives that are sufficiently close to $T^* M_{k+1}$ are considered.
The computations are carried out by least squares optimization techniques. Each time a new pair of
primitives is added to the partial matching list, the transformation has to be estimated
again from the start. The algorithm’s underlying paradigm is “locating while recognizing”, which is
different from the paradigm of “locating after recognizing” used in the algorithm of Grimson et al.

Reducing the number of sensed and model features is another important way to speed up the
tree search. The use of higher-level features can effectively reduce the size of the search
tree because fewer features are usually adequate. The system developed by Bolles, Horaud, et al.
[9, 10] is such an example. Three different types of edges are used as the primitive features:
straight dihedrals, circular dihedrals, and straight tangentials. These are higher-level features
in that one pair of matched features can determine all but one of the object’s six degrees of freedom.

Clustering is another technique used in recognition-localization algorithms. The principle of
clustering is very simple:

    For each element in the sensed feature list
        for each element in the model feature list
            if they are compatible, compute a transformation candidate
            and put it into cluster space.

The cells with the largest counts are expected to represent the location.
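
The clustering principle above can be sketched for a planar object as follows. Each feature is assumed, for this example only, to carry a type label, a position, and a direction angle, so that a single sensed/model pair determines a complete planar transformation candidate. The cell sizes, names, and data are hypothetical and do not come from any of the cited systems.

```python
import math
from collections import Counter

def cluster_pose(sensed, model, angle_cell=0.05, trans_cell=0.1):
    """Vote for planar poses (theta, tx, ty) in a quantized cluster space.
    Each feature is a tuple (kind, x, y, direction_angle)."""
    votes = Counter()
    for s_kind, sx, sy, s_ang in sensed:
        for m_kind, mx, my, m_ang in model:
            if s_kind != m_kind:                 # compatibility test
                continue
            d = s_ang - m_ang
            theta = math.atan2(math.sin(d), math.cos(d))  # wrap into (-pi, pi]
            c, s = math.cos(theta), math.sin(theta)
            tx = sx - (c * mx - s * my)          # translation implied by this pair
            ty = sy - (s * mx + c * my)
            cell = (round(theta / angle_cell),
                    round(tx / trans_cell),
                    round(ty / trans_cell))
            votes[cell] += 1                     # put the candidate into cluster space
    cell, count = votes.most_common(1)[0]
    pose = (cell[0] * angle_cell, cell[1] * trans_cell, cell[2] * trans_cell)
    return pose, count

# Hypothetical model: four edge features with distinct directions.
model = [("edge", 0.0, 0.0, 0.0), ("edge", 3.0, 0.0, 1.0),
         ("edge", 3.0, 2.0, 2.0), ("edge", 0.0, 2.0, -1.5)]
true_theta, true_t = 0.5, (2.0, 1.0)
c0, s0 = math.cos(true_theta), math.sin(true_theta)
sensed = [(kind, c0 * x - s0 * y + true_t[0], s0 * x + c0 * y + true_t[1], a + true_theta)
          for kind, x, y, a in model]
pose, count = cluster_pose(sensed, model)
```

Correct pairings all vote for the same cell while incorrect pairings scatter, so the most populated cell gives the pose to within the cell size. The two loops are independent per pair, which is the parallel structure noted in the discussion of clustering.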
While the principle is simple, the implementation is not so easy. The high dimensionality (six) and
the huge size of the cluster space are just two of the difficulties. Different methods have been
proposed to address these problems, among them three-dimensional clustering, the use of properly
sized cells, and hierarchical clustering [2]. Several systems based on the clustering technique
have been proposed; those of Linnainmaa et al. [23, 24, 25], Silberberg, Harwood, et al. [31], and
Stockman et al. [32, 33, 34] are typical examples. One property of clustering is the algorithm’s
parallel structure, which will have an important impact on the future development of object
localization algorithms.

In most algorithms, least squares optimization is the mathematical tool used to estimate the best
transformation when many feature pairs are found. But it is not the only
tool. Bolle and Cooper [5, 6, 7, 8] presented a statistical approach which combines pieces of
information to estimate the position of complex 3-D objects. They formulate optimal object
localization as a Bayesian probability estimation problem. The objective is to find the most likely
transformation $T$ that maps the model primitives onto the measured range data. The likelihood
$p(Y|T)$ should be maximized with respect to $T$, where $Y$ is the measurement data. If $k$ primitives
have been extracted from the range data and matched to model primitives, then
$p(Y|T) = \prod_{i=1}^{k} p(Y_i|T)$, where $Y_i$ is the data for the $i$-th primitive. That is, to arrive at a
globally optimal solution, the maximum likelihood estimation has to be applied locally. Based on this
analysis, they arrived at a formula for minimizing the estimation error that differs from the
traditional least squares optimization formula. To arrive at an optimal solution, a thorough analysis
of the measurement errors and a good error model are needed.
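
To see how the product form of the likelihood relates to least squares, consider the following sketch. It assumes, for illustration only, that the data for each primitive has independent Gaussian errors; the prediction function $h_i(T)$ and covariance $\Sigma_i$ are symbols introduced here and are not taken from [5, 6, 7, 8].

```latex
% Maximizing the product is equivalent to minimizing the sum of negative logs.
\hat{T} = \arg\max_{T} \prod_{i=1}^{k} p(Y_i \mid T)
        = \arg\min_{T} \sum_{i=1}^{k} \bigl( -\log p(Y_i \mid T) \bigr)

% Under the Gaussian assumption Y_i ~ N(h_i(T), \Sigma_i), each term is a
% covariance-weighted squared residual plus a constant.
-\log p(Y_i \mid T)
  = \tfrac{1}{2}\,\bigl(Y_i - h_i(T)\bigr)^{\top} \Sigma_i^{-1} \bigl(Y_i - h_i(T)\bigr) + \text{const}
```

Under this assumption the estimate is a weighted least squares solution in which each primitive's residual is scaled by its own error covariance. It coincides with ordinary least squares only when all $\Sigma_i$ are equal multiples of the identity, which is one way to see why a good error model matters.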


5 Summary 

We have discussed several object localization methods and strategies. Different levels of telerobot
control place different requirements on the localization system. At the low levels, real-time
execution and high accuracy are the important considerations. At the higher levels, the use of AI
(artificial intelligence) technology becomes crucial. Much work remains to be done to
develop a truly practical localization system. The issues discussed in Section 3 are just a few of
those which need to be addressed by future research.